1. Introduction and review

How to estimate a functional relation from observations that are all subject
to error has been a persistent statistical problem for 80 years. When the
relation is linear, when errors of observation are normally distributed, and
when either nothing is postulated about the distribution of the underlying
hypothetical variables or they are assumed to be also normally distributed
random variables, a consistent solution is possible only if the ratios of
variances and covariances of the errors are known (or alternatively all but
one of the second moment parameters of the error distributions). We shall
consider here a linear relation between two variates with homogeneous error
variances. Usually one assumes the covariance to be zero and the ratio of
variances known. With that condition the solution was given by Kummell in
1879 and has been several times rediscovered. Nevertheless the theoretical
foundation for Kummell's solution has remained ambiguous; it is known to be
consistent, but beyond that its statistical properties such as bias and
efficiency have not been investigated. Miss Dent (1935) seems to have been
the only writer to attempt to evaluate the sampling variance of the estimated
slope, and her solution is far from satisfactory. It ignores the distinction
between parameters and statistics, it is based on a Taylor expansion which is
not always convergent, and as it seeks the variance of $\tan(2\beta)$ it
degenerates toward infinity in the most important region where the slope,
$\tan\beta$, is near unity.

The purpose of this paper is to show why Kummell's solution is unique;
thence to prove that it is efficient and unbiased, with respect to the angle
of the line with either coordinate axis; and to obtain its sampling
distribution.
Consider the following model. Two variables, $\eta_1$ and $\eta_2$, are
linearly related:

$$\eta_1 = A + B\eta_2$$  (1)

or

$$\eta_1\cos\beta - \eta_2\sin\beta - \alpha = 0$$  (2)

where $B = \tan\beta$ and $\alpha = A\cos\beta$. Experimentation yields paired
observations

$$y_{pi} = \eta_{pi} + \delta_{pi}, \qquad p = 1, 2; \quad i = 1, \ldots, n.$$

The errors of observation, $\delta_1$, $\delta_2$, are assumed to be random
variables normally and independently distributed with zero means and common
variance $\sigma_0^2$. Except where otherwise stated nothing is postulated
about the distribution of $\eta_1$ (or equivalently, owing to the relation
(1), of $\eta_2$). The model is illustrated in fig. 1, where circles represent
equal frequency contours of the distribution of $\delta_p$. In particular we
will consider the circles with radius equal to $\sigma_0$. Only two
sub-populations are shown in the figure although usually n would be
substantially greater than two. The relationship (1) is represented by the
line AA'.

We shall consider also the more general model where errors of observation
are not independently distributed with equal variances. Let the observations
then be denoted $x_{pi}$, and assume them to be normally distributed with
variances $\sigma_{11}$, $\sigma_{22}$ and covariance $\sigma_{12}$ around
centers $\xi_{1i}$, $\xi_{2i}$ which obey the relation

$$\xi_1 = A' + B'\xi_2$$  (3)

This model is diagrammed in fig. 2. We consider in particular the contour
ellipse

$$\sigma_{22}\delta_1^2 - 2\sigma_{12}\delta_1\delta_2 + \sigma_{11}\delta_2^2 = \sigma_{11}\sigma_{22} - \sigma_{12}^2$$  (4)

where $\delta_p = x_p - \xi_p$. The length of any radius of this ellipse is the
standard deviation of any section of the bivariate frequency distribution
in the same direction; and the projection of the ellipse on any line is
the standard deviation of the marginal distribution of the sub-population
projected onto that line.

Provided we know either two of the quantities $\sigma_{pq}$, or two ratios
among them, this model can be transformed to the previous one by

$$y_1 = a_1x_1 + a_2x_2$$  (5)
$$y_2 = b_1x_1 + b_2x_2$$

subject to

$$a_1^2\sigma_{11} + 2a_1a_2\sigma_{12} + a_2^2\sigma_{22} = b_1^2\sigma_{11} + 2b_1b_2\sigma_{12} + b_2^2\sigma_{22}$$

$$a_1b_1\sigma_{11} + (a_1b_2 + a_2b_1)\sigma_{12} + a_2b_2\sigma_{22} = 0$$

a transformation which can be made in many ways. All deductions about the
"y" model can be transferred to the "x" model by the transformation.

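One of the many admissible transformations can be sketched as follows. This is only an illustration, not a construction given in the text: it assumes the three second moments of the errors are known up to a common factor and takes the coefficients from the inverse Cholesky factor of the error covariance matrix. The function names `equalizing_transform` and `transformed_moments` are hypothetical.

```python
import math


def equalizing_transform(s11, s22, s12):
    """Return (a1, a2, b1, b2) for y1 = a1*x1 + a2*x2, y2 = b1*x1 + b2*x2.

    Assumption of this sketch: the coefficients are the rows of the inverse
    Cholesky factor of the error covariance matrix, which is one choice among
    many that gives y1 and y2 equal error variances and zero error covariance.
    """
    det = s11 * s22 - s12 * s12
    if s11 <= 0 or det <= 0:
        raise ValueError("error covariance matrix must be positive definite")
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 * l21)
    return 1.0 / l11, 0.0, -l21 / (l11 * l22), 1.0 / l22


def transformed_moments(coef, s11, s22, s12):
    """Error variances of y1 and y2, and their covariance, after the transform."""
    a1, a2, b1, b2 = coef
    var1 = a1 * a1 * s11 + 2 * a1 * a2 * s12 + a2 * a2 * s22
    var2 = b1 * b1 * s11 + 2 * b1 * b2 * s12 + b2 * b2 * s22
    cov = a1 * b1 * s11 + (a1 * b2 + a2 * b1) * s12 + a2 * b2 * s22
    return var1, var2, cov
```

Because both conditions are homogeneous in the second moments, multiplying all three by a common unknown factor leaves the conditions satisfied, which is why ratios alone suffice.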
Efforts to find a means of estimating the functional relation (1) or
(3), under the conditions stated, have followed one of three lines, namely:
I, criteria of consistency; II, least squares; III, maximum likelihood.
(We exclude from consideration here estimates from moments of higher than
second order, which become available when $\xi$ is postulated to have a
non-normal frequency distribution.) The following review quotes only a
few of the papers on the topic to indicate salient features of the literature.

I. Criteria of consistency are exemplified by the proposals of Gini (1921),
Seares (1944, 1945) and Hald (1952). Usually this method seeks to apply an
adjustment to the regression lines. If the hypothetical values were known,
with means zero, we would have

$$B = \sum \eta_1\eta_2 \Big/ \sum \eta_2^2 .$$

Consistent estimators for the numerator and denominator are $\sum y_1y_2$ and
$\sum y_2^2 - n\sigma_0^2$ respectively. There are variants. With certain
assumptions the Kummell line may be indicated in this way (Lindley, 1947,
sec. 7.3).
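The adjustment just described can be sketched in a few lines. The sketch assumes that the error variance $\sigma_0^2$ is known and that the observations are first centered on their sample means (the text assumes zero means); the function name `adjusted_slope` is hypothetical.

```python
def adjusted_slope(y1, y2, sigma0_sq):
    """Slope estimate sum(y1*y2) / (sum(y2^2) - n*sigma0^2) on centered data.

    Assumptions of this sketch: sigma0_sq is known, and centering on the
    sample means stands in for the zero means assumed in the text.
    """
    n = len(y1)
    if n != len(y2) or n < 2:
        raise ValueError("need two samples of equal length, n >= 2")
    m1 = sum(y1) / n
    m2 = sum(y2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(y1, y2))
    s22 = sum((b - m2) ** 2 for b in y2)
    denom = s22 - n * sigma0_sq
    if denom <= 0:
        raise ValueError("adjusted denominator is not positive")
    return s12 / denom
```

With `sigma0_sq = 0` this reduces to the ordinary regression slope of $y_1$ on $y_2$; a positive error variance shrinks the denominator and so steepens the estimate.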

II. The method of least squares is the commonest approach. The
usual idea has been to minimize a sum of squares of deviations of
observations from the fitted line. The problem has been to determine in
what direction the deviations should be measured.

Adcock (1878-79) and Pearson (1901) minimized the sum of squares
perpendicular to the line without attention to the ratio of error variances.
Roos (1937) however pointed out that this produces a solution which fails
to be invariant under change of scale. He considered that the direction
in which deviations are measured should depend only on the precisions of
the observations and be independent of the slope of the line. He therefore
proposed to use deviations at 45° to either coordinate axis when variates
are scaled so as to have equal precisions. But Lindley (1947, sec. 8.2)
pointed out that even this gives consistent estimates only under rather
special conditions.

Kummell (1879) and Deming (1931-43) (assuming $\sigma_{12} = 0$) proposed to
minimize

$$S = \sum \left[ \frac{(x_1 - \xi_1)^2}{\sigma_{11}} + \frac{(x_2 - \xi_2)^2}{\sigma_{22}} \right]$$

or, equivalently, a proportionate expression using only the ratio
$\lambda = \sigma_{11}/\sigma_{22}$, subject to the restriction (3). Kummell
showed that a solution is obtainable only if $\lambda$ be known and that it is
equivalent to minimizing the sum of squares of deviations perpendicular to the
line when the variables are scaled so as to have equal error variances, that
is when transformed to the form (1). He reached the well known solution of the
quadratic equation

$$(S_{11} - \lambda S_{22})B + (\lambda - B^2)S_{12} = 0$$  (7)

where $S_{11} = \sum (x_{1i} - \bar{x}_1)^2$, etc.
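As an illustration, the quadratic (7) can be solved directly. The sketch below assumes $\lambda$ is known, takes the root whose sign agrees with that of $S_{12}$ (the two roots have product $-\lambda$, and this is the one that minimizes the sum of squares), and sets the intercept so that the line passes through the centroid of the observations. The function name `kummell_line` is hypothetical, and the degenerate case $S_{12} = 0$ is not handled.

```python
import math


def kummell_line(x1, x2, lam):
    """Return (A, B) for the fitted line x1 = A + B*x2 from equation (7).

    lam is the known ratio sigma11/sigma22 of the error variances.
    """
    n = len(x1)
    if n != len(x2) or n < 2:
        raise ValueError("need two samples of equal length, n >= 2")
    m1 = sum(x1) / n
    m2 = sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    if s12 == 0:
        raise ValueError("S12 = 0: slope is zero or infinite; not handled here")
    d = s11 - lam * s22
    # Root of S12*B^2 - d*B - lam*S12 = 0 having the same sign as S12.
    slope = (d + math.sqrt(d * d + 4.0 * lam * s12 * s12)) / (2.0 * s12)
    return m1 - slope * m2, slope
```

For observations lying exactly on a line the result does not depend on the value supplied for `lam`; with scattered observations it does, which is the sense in which the ratio must be known.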


To fit curves and planes he proposed an approximate method which shows an
interesting variation on the usual procedure. When residuals are not linear in
the parameters the classical least squares method begins by expressing the
residuals (before squaring) as the linear terms of a Taylor expansion in
adjustments to trial parameter values. Kummell first expresses S as a
Taylor expansion linear only in the deviations $(x_p - \xi_p)$ and uses this to
eliminate the $\xi_p$ and obtain for the residuals (e.g. $x_1 - A' - B'x_2$)
so-called weights which are usually functions of the parameters. The whole
expression S, with $\xi$ thus eliminated, is then expressed as a Taylor
expansion in adjustments to trial parameter values, with the 'weights'
remaining as functions of the parameters in subsequent differentiations.
(Roos criticizes laxity in his mathematical arguments.)

Deming's approximating procedure is more similar to the usual least
squares approximation but differs in that the expansion of the residuals
contains simultaneously terms both in $(x_p - \xi_p)$ and in parameter
adjustments. It leads to 'weighting' of the observable residuals similar to
Kummell's formulation, but with the weights expressed as functions of the
trial parameter values and hence entering differentiation as constants. In
classical procedure the parameter adjustments tend to zero as iteration
proceeds, with convergence on the required estimates. Deming's expansion
differs in the material respect that the discrepancies $(x_p - \xi_p)$ do not
tend to zero and cannot be made to do so without losing contact with the
observations, which must remain as the anchor for computations. His book
repeatedly states that the solutions obtained will differ from the true
ones only in squares of the residuals, but it evades enunciating the corollary
that, since these same squares of residuals constitute the function on whose
minimization the solution depends, they are not negligible. When the fitted
relation is linear the procedure leads to one of the regression lines (albeit
with a modified estimate of the error variances). In this case it fails to
distinguish (as it appears to purport to do) between regression and the
functional relation, and the proposed weighting procedure is wasted effort.
When a curved relation is to be estimated it does ensure that observations
in regions of very steep slope do not get the very high weights which
regression would in effect assign to them. But just what may be accomplished
seems not to have been threshed out. Recognition of the order of magnitude
of neglected terms suggests that bias may still be about as great as by any
simpler method; it suggests indeed that the method may be strictly appropriate
only when residuals are so small that almost any method of fitting will yield
a satisfactory result. (A similar comment was made by J. H. Smith, 1945.)
Lindley (1947) seeks a least squares solution by considering the squares
of residuals defined by the form in which the relation happens to be written;
in particular, using (3), he writes the residuals as

$$(x_1 - A' - B'x_2).$$

With assumptions $\sigma_{11}/\sigma_{22} = \lambda$ and $\sigma_{12} = 0$ he
then notes that the variance of these residuals is proportional to
$(\lambda + B^2)$ and states that this must be introduced as a weight so that
the function to be minimized is

$$\sum \frac{(x_1 - A - Bx_2)^2}{\lambda + B^2}$$  (8)

This produces Kummell's equation, but to imply that it is a weighted least
squares solution is, I think, misleading. Weights are introduced into the
least squares procedure to take account of variation in precision of the
observations. (I omit these here for simplicity. They, Lindley's $P_i$, $Q_i$,
are easily added if required.) Here all the observations have equal weight,
and $(\lambda + B^2)$ is not a weight in the ordinary sense. Lindley pertinently
remarks that his procedure has the advantage "that the redundant $\xi$ are never
mentioned", but he does not explain why they can be thus banished.

III. The maximum likelihood solution has been considered by Dent (1935),
Lindley (1947) and Kendall (1950, 1954). They begin by writing the likelihood,
assuming $\sigma_{12} = 0$, as

$$(2\pi)^{-n}(\sigma_{11}\sigma_{22})^{-n/2} \exp\left\{ -\frac{1}{2}\sum\left[ \frac{(x_1 - A - B\xi_2)^2}{\sigma_{11}} + \frac{(x_2 - \xi_2)^2}{\sigma_{22}} \right] \right\}$$  (9)

and treat $\xi_{2i}$ as parameters. With no further postulates Lindley and
Kendall conclude that all is not well with the resultant equations because
they lead to the ratio of estimates of $\sigma_{11}$, $\sigma_{22}$ being
$B^2$, an unacceptable result.
Though the result is admittedly unacceptable, that by itself is scarcely
adequate reason for rejecting the method, and they do not elucidate why this
result appears. Since the likelihood is formulated for 2n observations, the
indeterminacy is not due, as has been suggested, to trying to estimate more
parameters than there are observations. We shall see later that the trouble
is that, relative to the distribution of the only deviations which are
observable, these two parameters enter as a single unit.

This formulation, with the ratio $\sigma_{11}/\sigma_{22}$ known, is easily
seen to be equivalent to Kummell's least squares formulation. The unpleasant
feature lies in regarding $\xi_{2i}$ as parameters to be estimated, Neyman and
Scott's (1947) "incidental parameters". The word "parameter" as used in
statistics has not yet been very specifically defined. Relative to the theory
of maximum likelihood it may be defined as a characteristic constant of a
probability distribution. To specify a particular parameter we must be able to
specify the population of random variables of which it is a characteristic.
That being done it is at least theoretically possible to return again and
again to resample the specified population, thereby increasing the sample size
from which its characters may be estimated. But that is just what, in the
problem before us, cannot be done. A basic feature of the problem is that we
never know and can never state that any two or more pairs of observations are
drawn from the same sub-population with a particular $\xi$; the $\xi$ are never
definable as the characters of a specifiable population and we can never
increase sample sizes for their estimation as is required to demonstrate the
optimum properties of maximum likelihood estimators. Furthermore we are not
interested in the $\xi$ individually; we are concerned only to estimate the
line as a whole. The $\xi$ are essentially variables, not parameters at all.
They may or may not be random variables, that is variables with an associated
probability distribution, depending on the procedure by which observations are
selected.

If they are random variables the parameters of their distribution may become
parameters also of the overall distribution of the observable x, and the
appropriate formulation is self evident (appendix 2). If the location of
observations is chosen in a way which precludes assigning a probability
distribution to $\xi$, to treat them as parameters may yet be inappropriate,
and we should seek some other method to eliminate them from the problem.

2.  Least  squares  formulation 

The method of least squares seeks estimates of parameters which minimize
a sum of squares of residuals which are usually expressible as

$$x_i - \theta_i$$  (10)

where $x_i$ are observations, and $\theta_i$ are functions of the estimands and
known constants. For the Gauss-Markoff theorem to be applicable, with
consequent nice statistical properties of the estimators, it is necessary
(David and Neyman, 1937) that

(i) $E(x_i) = \theta_i$

(ii) $\theta_i$ be a linear function of the estimands

(iii) the relative weights of $x_i$ be known.

Little seems to be known about the precise statistical properties of estimators
when these conditions are not met. Conditions (i) and (iii) are invariably
assumed. Failure of (ii) creates no difficulty in principle for estimating
the parameters, beyond that the solution may have to be approached by
iteration; but estimators may no longer be unbiased, and the reliability of
approximations to their variances and covariances, based on linear
approximations, seems uncertain.
When we have to deal with a relation between observations all of which
are subject to error we do not obtain a clean separation between observations
and estimands as at (10). We must deal with residuals which are mixed
functions of the two, for example from (1) or (2):

$$r = y_1 - A - By_2 \quad\text{or}\quad y_1\cos\beta - y_2\sin\beta - \alpha .$$

This additional complication has been slurred over by writers who have
endeavored to apply the principle of least squares to such situations.

The residuals of classical least squares, and relative to which the
principle was developed, are univariate quantities. The residuals with which
we have to deal are compounded of two variates, but once compounded in a
defined manner the compound becomes again a univariate quantity and should have
a univariate probability distribution. The only quantities on which inference
must be based are deviations from observed points to the fitted line; there
are n such quantities. The hypothetical $\xi$ or $\eta$ being irrelevant to the
problem, if they can be eliminated, the deviations must be defined by
measurement in some specified direction.

It seems natural to assume as a first requirement that the residuals
should be formulated so that $E(r) = 0$. That condition is satisfied for all
formulations which have been proposed when expectation is taken to imply
expectation over the bivariate distribution of $y_1$ and $y_2$, or of $x_1$
and $x_2$, in each sub-population. But after noting that the residuals are
essentially univariate quantities it seems reasonable to consider their
distribution about the line as conditioned by the direction in which it has
been decided to measure them (cf. app. 1). Suppose one decides to measure the
deviations parallel to some arbitrary direction BB', fig. 2. The mean values
of conditional distributions formed by sectioning a bivariate distribution in
that direction then lie on a diameter CC' which bisects all chords of the
contour ellipse which are parallel to BB'. The conditional expectations of r
are then quantities QR whose magnitude depends on the distance of BB' from
$\xi$. The net effect of minimizing a sum of squares of such deviations is to
bias the estimated line in the direction CC'. However such bias can be
eliminated if the deviations be measured in one, and only one, particular
direction. If the deviations be measured in a direction parallel to tangents
to an equi-frequency contour ellipse at its intersection with the functional
line AA', that is in the direction DD', fig. 2, the locus of expectations CC'
then coincides with AA' and the expectations of deviations (QR) are
identically zero independently of $\xi$. When deviations are so measured we
may regard them as normally distributed about the line AA', and thus, and only
thus, the incidental variables $\xi$ can be eliminated from the problem.

When transformed to our "y" model, figure 1, the condition is seen to
be met when deviations for independent errors of equal variance are measured
perpendicular to the line. Furthermore the variance of such deviations is
immediately seen to be $\sigma_0^2$. In figure 2 the standard deviation is
equal to the length of the radius vector, TD'', parallel to DD', for the
ellipse (4); but the expression for this length is not simple.
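A small numerical sketch may make the point concrete for the "y" model. Writing the line in the form (2), the perpendicular deviation of an observation equals $\delta_1\cos\beta - \delta_2\sin\beta$ wherever the hypothetical point lies on the line, so for independent errors its variance is $\sigma_0^2(\cos^2\beta + \sin^2\beta) = \sigma_0^2$. The parametrization of the line by a distance `U` along it, and the function names, are assumptions of this sketch.

```python
import math


def point_on_line(U, alpha, beta):
    """Hypothetical point (eta1, eta2) at distance U along the line (2)."""
    eta1 = alpha * math.cos(beta) + U * math.sin(beta)
    eta2 = -alpha * math.sin(beta) + U * math.cos(beta)
    return eta1, eta2


def perpendicular_deviation(y1, y2, alpha, beta):
    """Signed deviation of the observation (y1, y2), perpendicular to the line."""
    return y1 * math.cos(beta) - y2 * math.sin(beta) - alpha
```

For example, adding the same pair of errors to points at several positions along the line gives the same deviation each time, which is the sense in which the position of the hypothetical point drops out.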

3.  Maximum  likelihood  formulation 
For simplicity consider the standardized "y" model, figure 1. The
deviations to be considered are those perpendicular to the line. From any
text book of analytical geometry (or from elementary trigonometry) the
distance of a point from the line is

$$(y_1 - A - By_2)(1 + B^2)^{-1/2}$$  (11)

We have seen that these deviations are normally distributed with mean
zero and variance $\sigma_0^2$. The likelihood is therefore

$$L = (2\pi\sigma_0^2)^{-n/2} \exp\left\{ -\frac{1}{2\sigma_0^2} \sum \frac{(y_1 - A - By_2)^2}{1 + B^2} \right\}$$  (12)

It is maximized when $\sum (y_1 - A - By_2)^2/(1 + B^2)$ is minimized. This is
Lindley's formulation but we now see that the factor $(1 + B^2)$ (or, more
generally, $\lambda + B^2$) is introduced to evaluate the only deviation from
whose distribution the redundant $\xi$ can be eliminated. It is not entered as
a weight; that it happens to be proportional to the variance of the marginal
distribution when the whole bivariate distribution is projected onto a
vertical, in which direction deviations are $(y_1 - A - By_2)$, is an
incidental circumstance of the bivariate normal distribution. Since the
Kummell solution can now be expressed as a maximum likelihood estimator not
involving "incidental parameters", it follows that it is asymptotically
efficient.

For further work the equation of the functional line is better expressed
in the intercept form (2). In regression analysis the slope of a regression
is normally distributed only because values of the independent variable are
taken as given constants. Furthermore we cannot evaluate a regression unless
these have some spread, that is $\sum (x - \bar{x})^2$ cannot be zero, and the
estimated slope cannot approach infinity. In the problem here considered B may
approach infinity from either direction while the spread of $\eta$ is still
substantial (depending on $\sigma_\eta$). A little consideration shows that the
system is circularly symmetric; therefore, to obtain a statistic which may be
symmetrically distributed and independent of $\beta$, we should consider the
angle of the estimated line, rather than its slope, whose distribution may be
very skew and dependent on $\beta$.
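Taking the parameters to be estimated as $\beta$ and $\alpha$, the log
likelihood is now

$$\ln L = \text{const.} - \frac{n}{2}\ln\sigma_0^2 - \frac{1}{2\sigma_0^2}\sum (y_1\cos\beta - y_2\sin\beta - \alpha)^2$$  (13)

To derive the estimators assume that $y_1$, $y_2$ are measured from their means
so that $\sum y_p = 0$. Then

$$\frac{\partial\ln L}{\partial\alpha} = \frac{1}{\sigma_0^2}\sum (y_1\cos\beta - y_2\sin\beta - \alpha)$$  (14)

whence $\hat\alpha = 0$. Further,

$$\frac{\partial\ln L}{\partial\beta} = \frac{1}{\sigma_0^2}\sum (y_1\cos\beta - y_2\sin\beta - \alpha)(y_1\sin\beta + y_2\cos\beta)$$  (15)

which is zero when

$$\sum\left( \tfrac{1}{2}(y_1^2 - y_2^2)\sin 2\beta + y_1y_2\cos 2\beta \right) = 0 ,$$

that is, when

$$\tan 2\hat\beta = \frac{2\sum y_1y_2}{\sum y_2^2 - \sum y_1^2}$$  (16)

which is equivalent to Kummell's solution (7).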
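Equation (16) determines $2\hat\beta$ only up to a multiple of $\pi$, so a computation has to choose the branch that minimizes, rather than maximizes, the sum of squared perpendicular deviations. A minimal sketch, assuming the two-argument arctangent is used for that purpose (a choice of this sketch, not something stated in the text), is given below; the function name `angle_estimate` is hypothetical.

```python
import math


def angle_estimate(y1, y2):
    """Angle estimate from (16), in radians, in the interval (-pi/2, pi/2].

    The observations are centered on their means first. Using atan2 selects
    the root of (16) at which the sum of squared perpendicular deviations is
    a minimum.
    """
    n = len(y1)
    if n != len(y2) or n < 2:
        raise ValueError("need two samples of equal length, n >= 2")
    m1 = sum(y1) / n
    m2 = sum(y2) / n
    s11 = sum((a - m1) ** 2 for a in y1)
    s22 = sum((b - m2) ** 2 for b in y2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(y1, y2))
    if s12 == 0 and s11 == s22:
        raise ValueError("angle is indeterminate for these data")
    return 0.5 * math.atan2(2.0 * s12, s22 - s11)
```

Working with the angle also avoids the difficulty of an infinite slope: when all the spread is in $y_1$ the estimate is simply a right angle.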

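Further deductions are easier in terms of transformed variables obtained
by rotating to coordinate axes parallel and perpendicular to the theoretical
line. Define

$$u_1 = y_1\cos\beta - y_2\sin\beta$$  (17)
$$u_2 = y_1\sin\beta + y_2\cos\beta$$

so that

$$E(u_{1i}) = \eta_{1i}\cos\beta - \eta_{2i}\sin\beta = \alpha$$  (constant independently of i)
$$E(u_{2i}) = \eta_{1i}\sin\beta + \eta_{2i}\cos\beta = U_i, \text{ say}$$  (18)

U is an alternative variable to $\eta_1$ or $\eta_2$ and represents the
distance of hypothetical points as measured along the line, ST in figure 1.
We then have

$$\ln L = \text{const.} - \frac{n}{2}\ln\sigma_0^2 - \frac{\sum (u_1 - \alpha)^2}{2\sigma_0^2}$$  (19)

$$\frac{\partial\ln L}{\partial\sigma_0^2} = -\frac{n}{2\sigma_0^2} + \frac{\sum (u_1 - \alpha)^2}{2\sigma_0^4} = 0, \quad\text{whence}\quad \hat\sigma_0^2 = \sum (u_1 - \hat\alpha)^2/n$$  (20)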
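The rotation (17) and the estimate (20) can be sketched as follows. The sketch takes the angle as given, does not assume the observations have been centered (so $\hat\alpha$ is computed as the mean of $u_1$, which reduces to zero for centered data), and uses hypothetical function names.

```python
import math


def rotate(y1, y2, beta):
    """Return (u1, u2) as in (17): coordinates across and along the line."""
    c, s = math.cos(beta), math.sin(beta)
    u1 = [a * c - b * s for a, b in zip(y1, y2)]
    u2 = [a * s + b * c for a, b in zip(y1, y2)]
    return u1, u2


def alpha_and_error_variance(u1):
    """Return (alpha_hat, sigma0_sq_hat) from the perpendicular coordinate u1.

    alpha_hat is the mean of u1; sigma0_sq_hat is the divisor-n estimate (20).
    """
    n = len(u1)
    if n < 1:
        raise ValueError("need at least one observation")
    alpha_hat = sum(u1) / n
    sigma0_sq_hat = sum((u - alpha_hat) ** 2 for u in u1) / n
    return alpha_hat, sigma0_sq_hat
```

In use, `beta` would be replaced by its estimate from (16), in which case `u1` holds the observed perpendicular deviations from the fitted line apart from the constant $\hat\alpha$.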

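Since $du_1/d\beta = -u_2$, and $u_2$ is a constant relative to each conditional
distribution along a fixed line DD', the elements of the information matrix are

$$-E_0\!\left(\frac{\partial^2\ln L}{\partial\alpha^2}\right) = \frac{n}{\sigma_0^2}, \qquad -E_0\!\left(\frac{\partial^2\ln L}{\partial\beta^2}\right) = \frac{\sum u_2^2}{\sigma_0^2}, \qquad -E_0\!\left(\frac{\partial^2\ln L}{\partial(\sigma_0^2)^2}\right) = \frac{n}{2\sigma_0^4},$$

$$E_0\!\left(\frac{\partial\ln L}{\partial\alpha}\,\frac{\partial\ln L}{\partial\beta}\right) = -E_0\!\left(\frac{\partial^2\ln L}{\partial\alpha\,\partial\beta}\right) = \frac{\sum u_2}{\sigma_0^2}, \qquad E_0\!\left(\frac{\partial^2\ln L}{\partial\sigma_0^2\,\partial\beta}\right) = E_0\!\left(\frac{\partial^2\ln L}{\partial\sigma_0^2\,\partial\alpha}\right) = 0$$  (21)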

where $E_0$ implies conditional expectation given $u_2$. However we have here
the peculiarity that as $\beta$ is varied there is a shift in the directions
along which the deviations are measured. On more accurately evaluating
var($\hat\beta$) it will turn out, as we might intuitively expect, that a
better approximation to the variance-covariance matrix is given by replacing
$u_2$ by U wherever it occurs in (21). A peculiar feature of this formulation
is that, allowing $u_2$ to be a variable, with
$E(u_{2i}^2) = U_i^2 + \sigma_0^2$ and $du_2/d\beta = u_1$, we obtain

$$-E\!\left(\frac{\partial^2\ln L}{\partial\beta^2}\right) = \frac{\sum U^2}{\sigma_0^2}$$

as required; but the alternative form $E(\partial\ln L/\partial\beta)^2$ still
remains as

$$\frac{\sum U^2 + n\sigma_0^2}{\sigma_0^2}.$$

The product term
$E\!\left(\frac{\partial\ln L}{\partial\alpha}\,\frac{\partial\ln L}{\partial\beta}\right)$
on the other hand has the same expectation as
$-E\!\left(\frac{\partial^2\ln L}{\partial\alpha\,\partial\beta}\right)$.
The full theoretical implication of that inconsistency eludes me; it is
evidently associated with the condition that the direction in which the
postulated sampling distribution of observations is defined "wobbles"
with errors in estimating its parameters. We shall see later that the
condition under which the indicated variances are approached is not
$n \to \infty$, but that the ratio ($\sigma_0^2$ / second moment of U) $\to 0$.

The asymptotic variance-covariance matrix of
$(\hat\alpha, \hat\beta, \hat\sigma_0^2)$ is therefore indicated to be

$$\frac{\sigma_0^2}{A}\begin{pmatrix} \sum U^2/n & -\bar{U} & 0 \\ -\bar{U} & 1 & 0 \\ 0 & 0 & 2\sigma_0^2 A/n \end{pmatrix}$$  (22)

where $\bar{U} = \sum U/n$ and $A = \sum (U - \bar{U})^2$.

An alternative formulation of the likelihood, and of the associated
variance-covariance matrix of the estimators, if estimates of individual
$U_i$ may be deemed relevant, is indicated in appendix 1. The likelihood
formulation if U is also a normally distributed random variable is stated
in appendix 2.
4. Summary of additional results

The distribution of $\hat\beta$ is symmetric. By approximating
$(\hat\beta - \beta)$ by a power series of variables, whose joint moment
generating function can be easily obtained, it has been ascertained that for
finite samples the variance of $\hat\beta$ is


where $w = (n - 1)\sigma_0^2/A$.

The  parameter  of  kurtosis  is 

V  .22.(2  ...fiL)  ♦  0 (A-2 ) 
^  A  lV 


(al) 


(a2) 


If 


ux  =  (yx  -  y2)  COS  p  -  (y2  -  y2)  sin  p 


then 


2 

E  ( u? )  =a^  n-2-w  +  -£L  (l-2w-w^)  -  (ll+l6w+9w^+3w^)  -  0(n*"^ )  I. 

1  -  n-1  (n-ir  J 


(a3) 


Since  (n-2 )  will  usually  be  large  relative  to  w  we  may  for  most  practical 
purposes  suppose  (n-2)  "degrees  of  freedom".  If  then  we  accept  as  estimator 
of  the  error  variance 


a2 

a 


•'T’A? 

~’U1 


/(n-2) 


its  mean  square  error  is 


If 


— -  -  hoop.  +  J>. 0(„-3) 

n-2  (n-2 ) 


sin  p  +  (y2-y2)  cos  p 


(all) 

(a*) 
a satisfactory estimator for the variance of $\hat\beta$ is
var(p)  ■  '3(1  +  (n-l)Q) 


<a6) 


where  Q 


^*2 

U1 


(n  -  2)  -n^uj 


E(var(8))  ■  ~  (1  +  w  +  -i-)  +  2-  (9  +  22w  -  8w2)  +  0 
A  n-1  a* 


nA 


(«7) 

(a8) 


which compares reasonably favorably with the value required for an unbiased
estimator as indicated by (a1).

An approximate fiducial interval for $\beta$ is given by


sin229 


a 

where  9  a  fi  -  P 


n-2 


¥2  -  So 

^7  *  ^0 


s 


p 


s 

c 


(yx  -  yx) (y2  -  y2) 


(a?) 


t = Student's t for the required fiducial probability and (n-2) degrees
of freedom.


The error in this interval seems likely to be negligible for the kind of data
to which such lines are usually fitted, say for fiducial intervals of less
than $\pi/4$.
I have been unable to obtain the exact sampling distribution of $\hat\beta$
under the model so far assumed. But if we add the postulate that U is normally
distributed about zero with variance $\sigma_U^2$, then, with notation as
follows


p  o  p 

al  °  °  3  °o 

2  2,  v  2 

o2  -  a  (Vj)  ■  aD  + 

2 »  2 

Q  -  +  a2 

2  2  2 
C  =  cr^  cos  Q  +  a, 

n  is  even  =  2k  +  it 
the  probability  density  function 


2 

a. 


.  2 
sin  9 


of  8  3 


(p  -  P)  is 


p^"j /_  p\  \  op )  pi 

m  -  — '  3#r-  * 

Proof of these statements, and the appendices noted above, will be
submitted in a later report.